Skip to content

[ML-42739] Add custom forecasting data splits for automl_runtime #145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Jul 25, 2024

Conversation

Lanz-db
Copy link

@Lanz-db Lanz-db commented Jul 18, 2024

This PR enables custom forecasting data splits. There are three main changes.

  • ArimaEstimator
    • add split_cutoff parameter
  • ProphetEstimator
    • add split_cutoff parameter
  • utils.py
    • add generate_custom_cutoff function to generate cutoffs based on given split_cutoff, for cross_validation
  • unit tests
  • manual e2e tests
  • add comments

@Lanz-db Lanz-db requested a review from wenfeiy-db July 18, 2024 16:53
@Lanz-db Lanz-db changed the title [ML-42739] Forecasting data splits [ML-42739] Add custom forecasting data splits Jul 18, 2024
Comment on lines 207 to 213
while result[-1] <= max(df["ds"]) - horizon_dateoffset:
cutoff += period_dateoffset
if not (((df["ds"] > cutoff) & (df["ds"] <= cutoff + horizon_dateoffset)).any()):
if cutoff < df["ds"].max():
closest_date = df[df["ds"] > cutoff].min()["ds"]
cutoff = closest_date - horizon_dateoffset
result.append(cutoff)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add comments like those in L173-182? Would be very helpful for review and when we look back in the future. E.g. is the cutoff in result guaranteed to be an existing data point in df or not?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also i'm not completely sure whether simply reverting -= to += is just correct. I'll take a deeper look later

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments added.

@apeforest
Copy link

Can you add more details in the PR description

@Lanz-db Lanz-db changed the title [ML-42739] Add custom forecasting data splits [ML-42739] Add custom forecasting data splits for automl_runtime Jul 23, 2024
@Lanz-db Lanz-db requested a review from wenfeiy-db July 24, 2024 23:57
max_cutoff = max(df["ds"]) - horizon_dateoffset
while cutoff <= max_cutoff:
# If data does not exist in data range (cutoff, cutoff + horizon_dateoffset]
if (not (((df["ds"] > cutoff) & (df["ds"] <= cutoff + horizon_dateoffset)).any())) and (cutoff < df["ds"].max()):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this redundant condition as we discussed (cutoff < df["ds"].max()?

@Lanz-db Lanz-db merged commit 90a5cc3 into branch-0.2.20.1 Jul 25, 2024
5 checks passed
@Lanz-db Lanz-db deleted the Lanz-db/branch-0.2.20.1 branch July 25, 2024 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants